PREFACE


Nobel Prize-winning economists Amartya Sen and Joe Stiglitz, in collabo- 
ration with a number of co-authors of the internationally acclaimed report 
“On the Measurement of Economic Performance and Social Progress,” noted 
that: 


“Those attempting to guide the economy and our societies are like pilots trying to steer
a course without a reliable compass. The decisions they (and we as individual citizens) 
make depend on what we measure, how good our measurements are and how well our 
measures are understood. We are almost blind when the metrics on which action is based 
are ill-designed or when they are not well understood. For many purposes, we need better 
metrics. Fortunately, research in recent years has enabled us to improve our metrics, and it 
is time to incorporate in our measurement systems some of these advances. There is also 
consensus among the Commission members that better measures may enable us to steer 
our economies better through and out of crises.” 


The German Data Forum (RatSWD) was founded to address these needs for 
more reliable statistics and better empirical research in Germany and beyond. 
The German Data Forum advises the German federal government and Länder
governments on issues that impact the expansion and improvement of the 
research data infrastructure in the empirical social, behavioral, and economic 
sciences. Since it was established in 2004 by the German Federal Ministry of 
Education and Research (BMBF, Bundesministerium für Bildung und
Forschung), the German Data Forum has significantly advanced the agenda set
forth by the Commission to Improve the Information Infrastructure (KVI,
Kommission zur Verbesserung der informationellen Infrastruktur zwischen
Wissenschaft und Statistik) and has supported the work of research funding 
agencies by making recommendations on how the KVI agenda can be most 
effectively implemented. The German Data Forum has thereby helped make a
wide range of high-quality, reliable microdata available to empirical 
researchers in the social, behavioral, and economic sciences at Research Data 
Centers and Data Service Centers throughout Germany. 

These data are enabling researchers to expand the frontiers of scientific 
knowledge. Viewed in isolation, findings from discrete research disciplines 
appear unspectacular; only on rare occasions do they yield a fundamentally 
new picture of the world or of society. It is for precisely this reason that 
patience and a long-term perspective are so crucial for research funding and
support. Of the many new conclusions that have been developed on the basis
of empirical data from the Research Data Centers, two groundbreaking 
findings can be cited as evidence of this: First, data from German pension 
insurance carriers have been used by several researchers to identify signi- 
ficant differences between male and female life expectancy depending on the 
level of education and corresponding differences in workplace health risks. 
Second, data from the Federal Labor Office, in which firm statistics were 
merged painstakingly with data on employment structures, have been used to 
show that exporting firms pay higher wages than non-exporting firms. This 
would be impossible to see from the raw statistical data, since exporting 
firms have a different product portfolio and personnel structure than non- 
exporters. 

The development and distribution of “Campus Files”, a noteworthy 
contribution to university education, is also among the achievements of the 
Research Data Centers and Data Service Centers established by the German Data
Forum and the German Ministry of Education and Research. By working 
with original statistical data, students obtain more advanced methodological 
training with greater practical relevance. This will undoubtedly pay off 
substantially in the years (and decades) to come — particularly when the 
graduates begin putting their statistical expertise to work professionally in 
such fields as policy analysis and market research. 

Despite the gains it has already made in expanding the research infra- 
structure, the German Data Forum is not content to rest on past achieve- 
ments. To the contrary, in 2008 it launched the project, “Developing the 
Research Data Infrastructure for the Social and Behavioral Sciences in 
Germany and Beyond: Progress since 2001, Current Situation, and Future 
Demands.” Building on its work from the last several years, the German Data 
Forum now aims to develop the research infrastructure even further, to 
ensure that it can meet future demands, and to identify emerging data needs 
in the German, European, and international contexts. The Federal Ministry of 
Education and Research will continue to lend its support in this important 
undertaking. 

The support of the Federal Ministry of Education and Research has made 
it possible to bring together over 100 renowned experts from a wide range of 
disciplines in an ongoing dialog. The results of this concentrated effort are 
compiled in the two-volume report “Building on Progress — Expanding the 
Research Infrastructure for the Social, Economic and Behavioral Sciences.” 
The nearly 70 advisory reports offer a detailed look at the situation from the 
perspective of various branches of the social, behavioral, and economic 
sciences in order to identify specific data needs. It is a comprehensive and 
systematic compendium designed for use by research organizations, funding 
agencies, and statistical offices. 

Government policy alone cannot create optimal conditions for improving
the research infrastructure. Dialog with the research community and the 
federal statistical agencies is critical. Acting as a platform for this dialog is 
one of the key tasks of the German Data Forum. The Federal Ministry of 
Education and Research looks forward to being a participant in this discus- 
sion. 


Berlin, November 2010 


Cornelia Quennet-Thielen 
State Secretary 
Federal Ministry of Education and Research (BMBF) 

INTRODUCTION


“Valid and reliable data are the indispensable foundation for research in the social sciences 
and economics: they ensure that research is in line with contemporary realities and provide 
convincing arguments for actions by citizens, policy-makers, and business leaders.” 


This is the opening sentence of the 2001 evaluation report by the German 
Commission on Improving the Information Infrastructure between Science 
and Statistics (KVI, Kommission zur Verbesserung der informationellen In- 
frastruktur zwischen Wissenschaft und Statistik), prepared on behalf of the 
Federal Ministry of Education and Research (BMBF, Bundesministerium für
Bildung und Forschung).¹ Ten years later, this statement still holds: the
provision of valid and reliable data through a sophisticated and sustainable
research infrastructure is an important task for both academic research and
official statistical institutions, and will remain so in the years to come.

The German Data Forum (RatSWD) was founded by the BMBF in 2004. 
Its origins, however, date back to 1999, when the BMBF appointed the KVI 
to submit a comprehensive report with recommendations to improve the Ger- 
man research infrastructure for the social and economic sciences. This report, 
published in 2001, still constitutes the basis for a large part of the work per- 
formed by the German Data Forum. Although the Forum's tasks have gradu- 
ally expanded, collaboration with the Research Data Centers and Data 
Service Centers, both of which have come into existence since the founding 
of the Forum, continues to form the backbone of its activities. However, 
since the publication of the KVI’s report, much has changed — and improved 
— in terms of data collection, preservation, access, and analysis. Thus, the 
time is ripe to systematically assess the progress made so far in Germany's 
information infrastructure and to discuss current challenges and future needs 
in the German, European, and international contexts. 

One of the key tasks of the German Data Forum is to offer informed ad- 
vice to the policy-makers, official data providers (especially state and federal 
statistical offices), and research funding bodies involved in building and 
running national and international statistical and research infrastructures for 
the social, economic, and behavioral sciences. To this end, the German Data 
Forum promotes dialog between, as well as within, academic research
infrastructures and official statistical services.


1 Kommission zur Verbesserung der informationellen Infrastruktur zwischen Wissenschaft
und Statistik (KVI) (Ed.) (2001): Wege zu einer besseren informationellen Infrastruktur.
Baden-Baden, 37 [own translation]. See also the documentation of the recommendations:
"Towards an Improved Statistical Infrastructure. Summary Report of the Commission set
up by the Federal Ministry of Education and Research (Germany) to improve the statistical
infrastructure in cooperation with the scientific community and official statistics", in:
Schmollers Jahrbuch 121 (3), 443-468.

The German Data Forum has made a major step towards achieving these 
objectives by commissioning advisory reports from internationally recog- 
nized scholars in the social, economic, and behavioral sciences to debate the 
future expansion of research infrastructure. These 68 advisory reports, in 
addition to their executive summaries and the recommendations of the 
German Data Forum, have been released as a comprehensive double-volume 
compendium, entitled, “Building on Progress — Expanding the Research 
Infrastructure for the Social, Economic, and Behavioral Sciences.” Given the 
length and detail of the aforementioned compendium, the German Data 
Forum made the decision to additionally issue this synopsis, which contains 
recommendations derived from these advisory reports. This abridged version 
is intended to quickly provide interested readers with a concise overview of 
the current state of Germany’s research data infrastructure and what is 
required for its improvement. In addition, this short version serves to provide
policy makers, scientists, and research funding bodies with a précis of the 
German Data Forum’s deliberations with respect to the conceptual conditions 
for an internationally competitive research environment in Germany. Both 
this publication and the original double-volume compendium are available as 
open access documents at Barbara Budrich Publishers. 

One of the overarching goals of the recommendations of the German 
Data Forum — and of the German Data Forum itself — is to create optimal 
infrastructural conditions in Germany for innovative research both at 
universities and independent research institutes and within the system of 
official statistics and government research institutes. This requires that 
researchers in all these institutions be equipped with the capabilities and tools 
they need to create and access databases in Germany and abroad. A second 
and equally important goal is to create and cultivate a research environment 
that allows young scholars, official researchers, and official statisticians with 
innovative ideas to achieve their full potential. 

A vibrant, structurally sound, and highly productive research environ- 
ment cannot be created using a top-down approach: the impetus must come 
from the research community itself. Scholars as well as official statisticians 
and researchers need formal procedures that promote competition and allow 
research entrepreneurship to flourish. The recommendations contained in the 
first part of this publication seek to facilitate these processes by communi- 
cating the needs of scientific researchers and statisticians to policy-makers 
and by promoting dialog among the various institutions involved. 

The recommendations of the German Data Forum are based on the 68 
reports published in the original double-volume compendium. Their prepa- 
ration began in the summer and autumn of 2008 with two international 
workshops at which authors exchanged ideas with members of the German
Data Forum. The intensive discussions that took place there regarding current
challenges and future demands facing Germany’s research infrastructure 
revealed the need to include more fields than initially planned. By 2010, the 
original number of about 60 advisory reports had increased to almost 70. 
Together, these advisory reports form a compendium of recent developments 
and data infrastructure needs in numerous fields — not only in the economic 
and social sciences, but to some extent also in the behavioral sciences. They 
touch on an array of methodological, ethical, and privacy issues related to 
data collection, preservation, and access, and take recent European and 
international developments into consideration. 

Although the German Data Forum has attempted to make the range of 
topics covered in the commissioned advisory reports as comprehensive as 
possible, one cannot claim to have covered every issue of relevance to the 
German research infrastructure in the behavioral, economic, and social sci- 
ences; the infrastructure for public health research, for example, is not dis- 
cussed. Furthermore, since the majority of advisory reports upon which this 
synopsis is based were written in 2009, it should be noted that the information
presented reflects the state of affairs at that point in time. In addition
to appearing in the German Data Forum’s aforementioned compendium, the
advisory reports are available online as RatSWD Working Papers.

The original double-volume compendium is divided into three main 
parts. The first part presents the German Data Forum’s recommendations on 
the further development of the research infrastructure for the social, 
economic, and behavioral sciences. 

The second part of the original compendium provides executive 
summaries of all of the advisory reports, including more detailed recom- 
mendations on how to meet current and future data needs. The summaries 
serve to provide the reader with a compact overview of current issues and 
needs in each research field. 

The third part of the original version contains the 68 advisory reports 
commissioned by the German Data Forum. The topics covered in these 
reports span numerous fields in the social, economic, and behavioral
sciences: economics, sociology, psychology, educational science, political 
science, geoscience, and communications and media research. Some reports 
focus mainly on substantive issues, some on survey methodology and issues 
of data linkage, some on ethical and legal issues, and others on the assurance 
of quality standards. 

The advisory reports are grouped by category. The first group addresses
the future demands likely to be placed on Germany’s research infrastructure
as well as the progress made since the first KVI report of 2001. One of the
main topics dealt with here is the harmonization of European research
infrastructures and possibilities for the permanent institutionalization of
certain elements thereof. These are followed by reports relating to specific
research fields, and to new data types and their potential applications in sci- 
entific research — for example, geodata, biodata, and transaction data. Many 
of these reports compactly highlight recent advances in research
methodology, such as the use of paradata (“data about data”) and of
“qualitative methods” that can enrich quantitative data. Others are concerned
with questions of data security and research ethics. 

Further advisory reports condense the main concerns of specific fields: 
migration and demography; vocational competencies, education, and 
research; labor markets and the economy; the state, the family, and health; 
political and cultural participation; and the role of the media. Since these 
have been identified as crucial research fields for research infrastructure, key 
aspects of each are discussed in numerous executive summaries. 

Most of the authors of advisory reports upon which the recommendations 
presented in this publication are based work in academic or governmental 
organizations in Germany, but important reports also came from private- 
sector experts and from European and US scholars. Because of the wide 
scope of expertise spanning many different fields and issues, this concise 
synopsis is of value not only for policy-makers, research funding bodies, and 
institutional data providers, but indeed for anyone interested in gaining a 
compact overview of Germany’s research infrastructure within its 
international contexts in the social, economic, and behavioral sciences. 

The entire process of preparing both this synopsis and the original com- 
pendium for publication was driven by a sense of enthusiasm, which became 
particularly evident at the workshops and in numerous discussions among 
contributors and German Data Forum members. We are grateful to everyone 
involved in bringing this publication, in addition to the original double- 
volume version, to fruition. 

First of all, we would like to thank the Federal Ministry of Education and 
Research (BMBF, Bundesministerium für Bildung und Forschung) for their
generous funding support for the project “Expanding National and Inter- 
national Research Infrastructure: Progress Since 2001, the Current Situation, 
and Future Needs” (grant number 01 UW 0805). This support provided the 
basis for intensive and systematic critical engagement with the topic of 
research infrastructure for the social, economic, and behavioral sciences, the 
results of which are presented in this publication. 

Our profound gratitude goes to the authors of the advisory reports, who, 
through their comments and suggestions at the two workshops, greatly assis- 
ted in developing a differentiated overview of the current data landscape and 
suggestions regarding its future expansion. Without this crucial input and 
their advisory reports, this publication would not have been possible. 

Further thanks go to all the members of the German Data Forum
(RatSWD) for their help in summarizing the findings of the advisory reports 
and in formulating recommendations based on these results. Special thanks 
go to Bruce Headey of Melbourne University, who provided numerous valu- 
able suggestions and was responsible for writing the executive summaries. 

We would like to further express our gratitude to our publisher,
Barbara Budrich, who demonstrated her competence as a publisher throughout
the whole production process of this publication.

This publication would never have been possible without the support of 
the German Data Forum (RatSWD) business office, specifically Gabriele 
Rolf-Engel, Patricia Axt, Lena Gond, Toby Carrodus, and Simon Wolff, who 
provided organizational, proofreading, and indexing assistance. Christoph 
Beck monitored the advisory reports and did the final proofreading and 
layout, all with exceptional commitment and careful attention to detail. 

Further special thanks go to Deborah Anne Bowen and Jennifer Dillon 
for the editing of numerous English-language manuscripts and for translating 
several contributions into English. It was a large and sometimes difficult 
project, and they completed it with perseverance, commitment, and analytical 
expertise. 

We are especially grateful to Claudia Oellers for her tireless dedication, 
immense effort, and the overall coordination of “Building on Progress — 
Expanding the Research Infrastructure for the Social, Economic, and Be- 
havioral Sciences.” 

The German Data Forum (RatSWD) adopted these recommendations at 
its 25th meeting on June 25, 2010, in Berlin.


Berlin, December 2010 


Heike Solga
Chairperson of the German Data Forum (RatSWD) 2007–2008

Gert G. Wagner
Chairperson of the German Data Forum (RatSWD) 2009–2011

Denis Huschka
Managing Director of the German Data Forum (RatSWD)

RECOMMENDATIONS


for Expanding the Research Infrastructure for the 
Social, Economic, and Behavioral Sciences 


The big picture: Measuring the progress of societies 


The importance of better data for the social, economic, and behavioral sci- 
ences is underscored by recent international developments. For decades, 
social progress was judged mainly by measures of economic performance; 
above all, by increases in gross domestic product (GDP). In 2009, the Com- 
mission on the Measurement of Economic Performance and Social Progress 
(“Stiglitz Commission”)¹ published its report, which opens with the statement
that “what we measure affects what we do.” It sought to bring about a change 
in social and political priorities by advocating that greater emphasis be 
placed on measures of well-being and of environmental and economic sus- 
tainability. 

The Stiglitz Commission’s recommendations form a backdrop to this
report.² Recommendation 6 in particular can serve as a unifying theme for our
recommendations; we quote it below in full. 


Both objective and subjective dimensions of well-being are important 


“Quality of life depends on people’s objective conditions and capabilities. Steps should be 
taken to improve measures of people’s health, education, personal activities and environ- 
mental conditions. In particular, substantial effort should be devoted to developing and 
implementing robust, reliable measures of social connections, political voice, and insecu- 
rity that can be shown to predict life satisfaction.” 


In Germany, the Statistical Advisory Committee (Statistischer Beirat), which 
advises the Federal Statistical Office, made the Stiglitz Commission’s report 
the backbone of its recommendations for the next few years. The Committee 
writes: 


“Initiatives for the further development of national statistical programs – above all
demands for new data – often come from supra- and international institutions: the EU
Commission, the European Central Bank, the UN, OECD and the IMF. The Statistical Advisory
Committee (Statistischer Beirat) believes that valuable key initiatives will come from the
Stiglitz Commission and the theme Beyond GDP advanced by the European Commission.
Official statistics, in cooperation with the scientific community, must react to these
initiatives and their system of reporting must develop accordingly.”


1 Report by the Commission on the Measurement of Economic Performance and Social
Progress, chaired by Joseph E. Stiglitz, Amartya Sen and Jean-Paul Fitoussi,
http://www.stiglitz-sen-fitoussi.fr, and Stiglitz, J./Sen, A. and Fitoussi, J.-P. (2010):
Mismeasuring Our Lives: Why GDP Doesn’t Add Up. New York.

2 International organizations like the Organisation for Economic Co-operation and
Development (OECD) are dealing with similar issues. For example OECD established the
“Global Initiative on Data and Research Infrastructure for the Social Sciences (Global Data
Initiative)” as part of its “Global Science Forum.”


We want to stress this point in particular: Beyond GDP will be a fruitful con- 
cept only if it is discussed and shaped collaboratively by government statis- 
tical agencies and academic scholars. As the Statistical Advisory Committee 
wrote: 


“The Federal Statistical Office should take stock of the non-official data which may be 
available with a view to measuring the multi-dimensional phenomenon of quality of life. 
The development of statistical indicators should be undertaken in cooperation with the 
scientific community.” 


Further, at the 12th German-French Council of Ministers in February 2010,
President Sarkozy and Chancellor Merkel agreed on the Agenda 2020, which 
included joint work on new measures of social progress. This again was a 
clear message that policy-makers are interested now more than ever in sound 
empirical evidence about a wide range of social and economic trends indica- 
tive of human progress or regress. 

The following principles and themes are not intended to contribute di- 
rectly to discussion of the Stiglitz Commission report or the initiative of the 
German-French Council of Ministers. But they do lay the groundwork for 
improved measurement of economic performance and social progress. 

We strongly believe that recent improvements in survey methods and 
methods of data analysis hold promise of contributing substantially to im- 
proved measurement of social progress. 


Background 


This report is based on contributions by approximately one hundred social
scientists³ who were invited by the German Data Forum (RatSWD) to write
advisory reports on key research issues and future infrastructure needs within
their areas of expertise; their reports are published as part of this publication.⁴
The number of experts who have contributed is even larger than it was when
the predecessor of this report was published in 2001.⁵


3 To avoid long-winded expressions, the term social sciences will be used in the remainder of
this report to refer to all the behavioral, economic, educational, and social sciences, as well
as related disciplines.

4 Some working papers that were not commissioned by the German Data Forum but that are
of interest too are available on the homepage of the German Data Forum. See
http://www.ratswd.de/eng/publ/workingpapers.html, especially Working Papers 50, 52, 79,
113, 131, 135, 137, 139, 141, 151, and 153.

The advisory reports cover a wide range of fields of the behavioral, eco- 
nomic, and social sciences: sub-fields of economics, sociology, psychology, 
educational science, political science, geoscience, communications, and 
media research. Some reports focus mainly on substantive issues, some on 
survey methodology and issues of data linkage, some on ethical and legal 
issues, some on quality standards. Most contributors work for German aca- 
demic or governmental organizations, but important reports were also re- 
ceived from individuals in the private sector and from European and Ameri- 
can academics. All had a focus on German infrastructural needs, but German 
as well as international contributors emphasized the importance of interna- 
tional collaborative and comparative research. All reports have been repeat- 
edly peer reviewed; they have been discussed and amended at successive 
meetings and in working groups organized by the German Data Forum 
(RatSWD). 

We first set out some guiding principles underlying the recommenda- 
tions. The core of the recommendations is structured around a set of prin- 
ciples and specific recommendations regarding infrastructure for the social 
sciences. 

Research in the fields of public health and social medicine is not re- 
viewed. These are clearly such important and distinct fields that they require 
their own major reviews. 


Principles guiding the recommendations 


Evidence-based research to address the major issues confronting humankind 


The social sciences can and should provide evidence-based research to ad- 
dress many of the major issues confronting humankind: for example, tur- 
bulent financial markets, climate change, population growth, water shortages, 
AIDS, and poverty. In addressing some of these issues, social scientists in 
Germany need to cooperate with physical and biological scientists, with 
scholars in the humanities, and also with the international community of 
scientists and social scientists. 


5 Kommission zur Verbesserung der informationellen Infrastruktur zwischen Wissenschaft 
und Statistik (KVI) (Ed.) (2001): Wege zu einer besseren informationellen Infrastruktur. 
Baden-Baden. For an English translation of the recommendations, see: “Towards an 
Improved Statistical Infrastructure — Summary Report of the Commission set up by the 
Federal Ministry of Education and Research (Germany) to Improve the Statistical 
Infrastructure in Cooperation with the Scientific Community and Official Statistics.”
Schmollers Jahrbuch, 121 (3), 443-468. 

Competition and research entrepreneurs


In making recommendations about the future of research funding and re- 
search infrastructure, we recognize the importance of competition and re- 
search entrepreneurs. This may seem an unusual perspective. In many coun- 
tries, including Germany, there is a tradition of centralizing research funding 
and infrastructure decisions. In our view, this is suboptimal. Science and the 
social sciences thrive on competition: competition of theories and ideas,
competition of methods, and competition of infrastructures.

Public funding of research infrastructure is certainly needed because re- 
search findings and research infrastructure are public goods and would be 
undersupplied in a free market.⁶ But decisions should not be made in a
centralized, top-down fashion, an approach that has the effect of stifling rather
than promoting innovation. The experience of the last few years has
demonstrated, notably in the field of empirical educational research, that many
fruitful new ideas and initiatives can emerge from a decentralized structure
that would almost certainly never have resulted from a “master plan.” In
Germany, the National Educational Panel Study (NEPS) and the Panel
Analysis of Intimate Relationships and Family Dynamics (pairfam) are
particularly worthy of mention. Both are new panel studies with a long time horizon.

The history of Germany’s Research Data Centers and Data Service 
Centers illustrates the same point. All the Research Data Centers and Data 
Service Centers established in the last six years were the result of independ- 
ent initiatives intended to meet distinctive research needs. The KVI laid the 
groundwork by initiating the establishment of the first six Research Data 
Centers through central funding. All the later centers were bottom-up
developments. The Federal Ministry of Education and Research (BMBF,
Bundesministerium für Bildung und Forschung) or other departments provided some
project funding for a few centers. What was crucial was the basic concept for 
the Research Data Centers, and that was developed by the KVI in its 2001 
report. 

It is true that the German Data Forum (RatSWD) later institutionalized 
this framework by establishing a Standing Committee of the Research Data 
Centers and Data Service Centers (Ständiger Ausschuss Forschungsdaten- 
Infrastruktur des RatSWD). This committee helps the centers to work to- 
gether and put forward common interests, but it does not initiate new centers. 
Indeed, we believe that the German Data Forum (RatSWD) should not do so. 
What is necessary is a common framework for new initiatives that aim to 
raise Germany’s social science infrastructure to a higher level. 


6 See also UK Data Forum (2009): UK Strategy for Data Resources for Social and Economic 
Research. RatSWD Working Paper No. 131. 




In this report we take some further steps towards developing a common 
framework for research infrastructure in the social sciences. In doing so, we 
bear in mind the increasing opportunities open to German researchers to 
contribute to European and international databases and projects, as well as to 
projects in Germany itself. We formulate some principles and highlight a 
range of concepts and ideas drawn from the advisory reports. 

We do not make detailed recommendations about specific research fields 
or particular infrastructural facilities. This would run counter to our view that 
innovative research directions and new ideas develop mainly at the grass- 
roots of scientific and statistical communities. The advisory reports did in- 
clude a large number of recommendations for promoting research in specific 
fields and on specific issues. A few of these recommendations are included in 
this report as examples, but in general our approach is to make recommenda- 
tions about institutions and processes in which competition and research 
entrepreneurship can flourish. Nevertheless, by providing the advisory re- 
ports in this publication, we hope to give research funding bodies some idea 
about the budgets that may be needed if particular ideas are put forward by 
“scientific entrepreneurs.” 


The important role of younger researchers 


Closely connected to the need for competition and innovation in science is 
the need to develop and foster excellent young researchers and ensure that 
they have sufficient influence in the research community for their ideas and 
research skills to flourish. It is, in general, true that a centralized research 
environment favors older, well-established researchers. Almost unavoidably, 
it is they who are appointed to the main decision-making positions. However 
eminent they are, their decisions may tend to favor well-established research 
topics and well-established methods. Innovation, on the other hand, is more 
likely to come from younger and mid-career researchers. 

An important aim and principle underlying this report is to enhance the 
roles, influence, and opportunities of younger and mid-career researchers. 
They should be encouraged and given incentives to act as research entrepre- 
neurs, competing to attract funding, develop infrastructure, conduct research, 
and disseminate new hypotheses and findings. They may, however, have 
occasion to form research networks among themselves, and this should be 
supported.⁷

The need to encourage younger researchers is particularly clear in the of- 
ficial statistical offices. They need more freedom to improve official statistics 
by doing research. Further, with more research opportunities available,
employment in official statistical offices will become more attractive to
innovative post-doctoral researchers. Recommendations along these lines are
developed under Theme 2 below, where we also suggest that it would be valuable
to form new kinds of partnerships with private-sector data collection
agencies for the performance of specific infrastructure tasks.


7 See the editorial in Science, April 2, 2010, Vol. 328, 17, and letters in Science, August 6,
2010, Vol. 329, 626-627.


Social science requires improved theory and methods, not just more data 


The main focus of this report is necessarily on research infrastructure and 
databases, but we want to highlight explicitly the importance of further im- 
provements in social science theory and also in statistical and survey meth- 
ods. 

Social scientists in almost all fields complain about data deficiencies. 
The usually unstated assumption is that if only they had the right data, they 
could do the rest. This is self-serving and misleading. Theory and method are 
also crucial, and new developments in these domains often go hand in hand 
with availability of new data sources. The advisory reports published in 
Part III of this compendium describe exciting new data sources available to 
social scientists, including data arising from “digitization,” geo-referencing, 
and bio-medical tests. We make some recommendations about linkages be- 
tween new and increasingly available data sources and potential improve- 
ments to social science theory and method. 


Research ethics and data protection are of growing importance 


Most data in the social sciences are of course data on human subjects. This 
means that principles of research ethics and privacy need to be observed. In 
Germany the right to privacy is enshrined in the Federal Data Protection Act 
(BDSG, Bundesdatenschutzgesetz), which protects individuals against the 
release of any information about their personal or material circumstances that 
could be used to identify them. Principles of research ethics, on the other 
hand, are not embodied in law but are dealt with by the scientific community 
through codes of ethics promulgated by their professional associations. 

Due to new technological developments, data protection and research 
ethics are of growing importance. Two of the themes outlined below reflect 
this importance. 


Specific recommendations


In this section, we summarize insights arising from the advisory reports and 
subsequent discussions within the German Data Forum (RatSWD). We do 
this by presenting ten themes. Most of them represent general ideas and fairly 
abstract recommendations. We aim to encourage debate in the scientific and 
policy-making communities. 


Theme 1: Building on success: Cooperation between official statistics 
and academic researchers 


The German Data Forum’s (RatSWD) current activities, as well as the pre- 
sent compendium, build on substantial achievements flowing from the 2001 
KVI report. A major theme of that report was the need for improved cooper- 
ation between academics and the official statistical agencies, particularly in 
regard to making official datasets available for academic research. Initially, 
four Research Data Centers and two Data Service Centers were set up to 
provide academics and other users with access to official data files and with 
training and advice on how to use them. The original Research Data Centers 
are associated with the Federal Statistical Office, the Statistical Offices of the 
German Länder, the Institute for Employment Research (IAB, Institut für
Arbeitsmarkt- und Berufsforschung) of the Federal Employment Agency
(BA, Bundesagentur für Arbeit), and the German Pension Insurance (RV,
Deutsche Rentenversicherung). Since then, nine more Research Data Centers 
have been founded (June 2010) and, after being reviewed by the German 
Data Forum (RatSWD), they joined the group of certified Research Data 
Centers. It is also worth noting that, after their first three years, all the origi- 
nal Research Data Centers and Data Service Centers were formally reviewed 
and received positive evaluations. 

One of the advisory reports provided for this review offered the observa- 
tion that, as a result of the Research Data Centers, Germany went from the 
bottom to the top of the European league as an innovator in enabling scien- 
tific use of official data. It has also been suggested that the Research Data 
Centers have had benefits that were not entirely foreseen, in that civil serv- 
ants and policy advisors are increasingly using research-based data from 
Research Data Centers to evaluate existing policy programs and plan future 
programs. Civil servants have more confidence in academic research findings 
knowing that they are based on high-quality official data sources and that the 
researchers have received advice on how to use and interpret the data. 

Official data files have also become more readily available for teaching 
in the higher education sector as a result of the recommendations of the 2001 
KVI report. CAMPUS-Files, based on the Research Data Center files, have
been created for teaching purposes and are widely used around the country. 

It is important to note that the Research Data Centers have made good 
progress in dealing with a range of privacy and data linkage concerns that 
loomed large ten years ago. Particular progress has been made in linking em- 
ployer and employee data. Research Data Centers have also, in some cases, 
been able to develop procedures for enabling researchers to have remote 
access to data once they have worked with officials in the relevant agencies 
and gained experience in using the data. 

Partly due to the progress already made, but mainly due to technological 
and inter-disciplinary advances, new and more complicated issues relating to 
data protection, privacy, and research ethics keep arising. Some of these 
issues emerge because of the increasing availability of types of data that most 
social scientists are not accustomed to handling, including biodata and geo- 
data. Other issues emerge due to the rapidly increasing sophistication of 
methods of record-linkage and statistical matching. These issues are dis- 
cussed in more detail under Theme 8 (“Privacy”) and Theme 9 (“Ethical 
Issues”). 

Based on these considerations, it is recommended that work continues 
towards providing a permanent institutional guarantee for the existing Re- 
search Data Centers. In the best-case scenario, Research Data Centers that 
belong to the statistical offices and similar institutions should be regulated by 
law. At present, the costs of Research Data Centers are borne by the agencies 
that host them, and users are usually not required to pay more than a nominal 
fee. We believe that this is the best way to run the centers because it ensures 
maximum use of official data. In the event that funding issues arise in public 
and policy discussions, it is recommended that cost-sharing and user-pays 
models be investigated. 

It is recommended that methods of obtaining access to a number of im- 
portant databases that are still de facto inaccessible to researchers be investi- 
gated. Examples include criminal statistics and data on young men collected 
through the military draft system. 

In particular, it is recommended that methods of permitting remote data 
access to Research Data Center files continue to be investigated. 

It is recommended that the microdata of the 2011 Census — the first 
Census in almost 30 years — should be accessible and analyzed in-depth by 
means of concerted efforts on the part of the scientific community and fund- 
ing agencies for academic research. 

It is recommended that peer review processes be established and sufficient
resources allocated so that “total quality management” also covers the
data produced by government research institutes
(Ressortforschungseinrichtungen).
We are in favor of a coordinated and streamlined process. We take a
critical view, however, of the current trend towards increasing numbers of 
evaluations: this is neither efficient nor beneficial to the scientific content. 

It is recommended that data providers in Germany collaborate more 
closely with the European Union’s statistical agency, Eurostat. 


Theme 2: Inter-sector cooperation: cooperation between academic 
research, the government sector, and the private sector 


A major theme of the 2001 KVI report was the need for greater cooperation 
and collaboration among academic social scientists, official statistical agen- 
cies, and government research institutes (Ressortforschungseinrichtungen). 
Since then, it has become clear that in many areas of data collection and 
analysis, official institutes and academic organizations can form effective 
partnerships. Such partnerships would be strengthened if younger researchers 
in all areas were permitted more independent roles. 

Much remains to be done. Academic research teams and official statisti- 
cal agencies and research institutes probably still do not always realize how 
much they have to gain from collaboration. But each side must pay a price. 

Academics need to understand and respect the social, political, and ac- 
countability environments in which official agencies operate. The official 
agencies (including the ministries and parliaments behind them), for their 
part, need to be willing to give up monopoly roles in deciding what specific 
data to collect and disseminate. 

A strong case can be made that the improved level of cooperation that 
has been seen in recent years between academic social scientists and official 
statistical agencies and authorities should now be extended to include the 
private sector as well. Many large social and economic datasets, especially 
surveys, are collected by private-sector agencies. Since these agencies oper- 
ate in a competitive market, they need a reasonably steady and secure flow of 
work in order to be able to make the investments required to maintain high- 
quality standards in data collection and documentation. Public-private part- 
nerships may be desirable for initiating, attracting funding for, and continu- 
ing long-term survey-based projects. The UK’s Survey Resources Network 
has experience in these ventures and may be able to offer useful guidance. 
Finally, a steady and sufficient flow of work is necessary
to ensure competition between private fieldwork firms.

There are many opportunities for methodological investigations carried 
out in cooperation among academics and government and private-sector sur- 
vey agencies. One clear example is investigation of the advantages, disad- 
vantages, and possible biases of mixed-mode surveys. Mixed-mode surveys, 
which are more and more widely used, involve collecting data using a variety 
of methods, for example, personal interviews, telephone, mail, and Internet.
In practice, respondents are commonly offered a choice of method, and the 
choice they make may affect the evidence they report. 

Leaving aside cooperative ventures with public sector and academic cli- 
ents, it is clear that private sector fieldwork agencies already collect a vast 
amount of market research data of great potential value to academic research- 
ers. 

The potential of market research data for secondary analysis lies mostly 
in the fields of consumption patterns and media usage. The German market 
research industry is huge — it has an annual turnover of more than two billion 
euros — and over 90 percent of its research is quantitative. However, samples 
are often highly specialized; telephone interviewing is the most common 
mode of data collection; and data documentation standards are not as high as 
academic social scientists would wish. Nevertheless, secondary data analyses
seem worthwhile, not least as a kind of quality control for
these data. Clearly, too, the commercial clients for whom data are collected
would have to give permission for secondary analysis. The data would have 
to be anonymized not only to protect individuals, but also to protect commer- 
cially sensitive information about products. 

In addition, transaction data (e.g., about purchasing behavior) that is 
generated by commercial firms can be of interest for scientific research. In 
this case, anonymization is extremely important. The German Data Forum 
(RatSWD) makes no specific recommendation about this issue beyond the 
view that recognition of market research data and transaction data merits 
consideration in the scientific and statistical communities. 


Theme 3: The international dimension 


The main focus of the detailed advisory reports contained in this publication 
is of course on German social science infrastructure and research needs, but 
the international dimension is critical too. Plainly, many of the problems with 
which social scientists as well as policy-makers deal transcend national bor- 
ders; for example, turbulence in financial markets, climate change, and 
movements of immigrants and refugees. Furthermore, international compara- 
tive research is an important method of learning. Similar countries face sim- 
ilar issues, but have developed diverse and more or less satisfactory policy 
responses. To do valuable international comparative research, researchers 
usually need to work with skilled foreign colleagues. 

International data collected by the EU and other supra-national organiza- 
tions have important strengths but also important limitations. The data are at 
least partly “harmonized” and cross-nationally comparable. Generally, how- 
ever, data coverage is restricted to policy fields for which international or- 
ganizations have substantial responsibility. Data are much sparser in areas 
that are still mainly a national-level responsibility. Furthermore, the needs of
policy-makers, for whom the data are collected, do not exactly match the 
needs of scientists. 

For example, policy-makers require up-to-date information, whereas sci- 
entists give higher priority to accuracy. Policy-makers are often satisfied with 
use of administrative and aggregate data and accept “output harmonization,” 
whereas scientists favor the collection of micro-level survey data and prefer 
“input harmonization,” that is, data collection instruments that are the same 
in each country. 

With regard to international cooperation, which still raises some difficult 
problems for German researchers — in part because of legal restrictions on 
data sharing — we recommend that a working group be set up by the German 
Data Forum (RatSWD) to find ways of making German official statistics 
available as anonymized microdata to reliable foreign research institutes. 

There are several cooperative European ventures that will be discussed in 
an open and constructive manner. These include a new European household 
panel survey under academic direction, Europe-wide studies of birth and 
other age cohorts, and a Europe-wide longitudinal study of firms. It would 
also be of great benefit to comparative European research if access to micro- 
level datasets held by Eurostat could be improved. Ideally, these data would 
be made available by virtual remote access, with appropriate safeguards to 
ensure data security. 

It is noted that, following a British initiative, an International Data 
Forum (IDF) has been proposed. Along the lines of the UK Data Forum and 
the German Data Forum (RatSWD), this body would aim to bring together 
academic researchers and official statistical institutes, including international 
organizations like the OECD. The plan is currently being developed via an 
Expert Group set up under the auspices of the OECD. It is recommended that 
Germany participate in this and related initiatives through the German Data 
Forum (RatSWD) and possibly other bodies. 

Finally, it is clear that the academic data providers are not very well or- 
ganized at the international and supra-national level. Notable exceptions are 
international survey programs like the European Social Survey (ESS) and the 
Survey of Health, Ageing and Retirement in Europe (SHARE), and networks 
of archives like the Council of European Social Science Data Archives 
(CESSDA), “Data Without Boundaries,” and the “Committee on Data for 
Science and Technology (CODATA).” We recommend that the academic 
sector consider setting up an independent organization to represent its inter- 
ests at the European and worldwide levels. This academic organization 
would be one of the partners in the international bodies that are likely to be 
established following the OECD initiative. 


Theme 4: Data on organizations and “contexts”


It is clear that, since the 2001 KVI report, a great deal of progress has been 
made in Germany to improve academic researchers’ access to firm-level data 
— that is, to data on employers and employees. These are high-quality data 
mainly collected in official surveys; firms are required to respond and to 
provide accurate information about the firm and its employee structure. Most 
statistical data of this kind are now available from Research Data Centers. 
Progress has been made on issues of data linkage, while protecting confiden- 
tiality, with the result that it is now often possible for researchers to link data 
from successive official surveys of the same firm. It is not, however, at pre- 
sent legally possible to link surveys of German firms to international da- 
tasets. This would be a desirable development, given that many firms now 
have global reach. 

Progress made in improving access to data on business organizations 
points the way towards what needs to be achieved in relation to the many 
other organizations and contexts in which people live and work. Individual 
citizens are typically linked to multiple organizations: firms, schools, univer- 
sities, hospitals, and of course their households. Linking data on these 
organizations and contexts with survey data on individuals would be desir- 
able. Yet technical problems concerning algorithms for linking data are cer- 
tainly easier to solve than the important questions regarding research ethics 
and data confidentiality that are in need of discussion. 

At present, then, there are no German datasets that have adequate statis- 
tical information on all the organizations in which individuals operate. Data 
thus need to be collected in surveys on persons and activities in multiple 
organizations, and where possible, linked to data about the organizations 
themselves. This could potentially be achieved by (1) adding additional 
questions about organizational roles to existing large-scale surveys, perhaps 
including the large sample of the German Microcensus, as well as by (2) 
linking existing survey datasets on these organizations with Microcensus data 
and other surveys on individuals and households. 
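
The second route, linking survey records on individuals to data about their organizations, can be sketched roughly as follows. This is only an illustration under stated assumptions: the field names (person_id, firm_id, employees, sector) and all values are invented, and any real linkage would additionally require respondents' consent and the confidentiality safeguards discussed under Themes 8 and 9.

```python
# Hypothetical survey records on individuals; all names and values are invented.
persons = [
    {"person_id": 1, "firm_id": "F1", "job_satisfaction": 7},
    {"person_id": 2, "firm_id": "F2", "job_satisfaction": 4},
    {"person_id": 3, "firm_id": None, "job_satisfaction": 6},  # no organizational link
]

# Hypothetical organization-level records, keyed by the shared identifier.
firms = {
    "F1": {"employees": 250, "sector": "manufacturing"},
    "F2": {"employees": 12, "sector": "services"},
}


def link_persons_to_firms(persons, firms):
    """Attach firm-level context to each person record (a left join on firm_id)."""
    linked = []
    for person in persons:
        # Persons without a matching firm keep their record, with empty context.
        context = firms.get(person["firm_id"], {})
        linked.append({
            **person,
            "firm_employees": context.get("employees"),
            "firm_sector": context.get("sector"),
        })
    return linked


linked = link_persons_to_firms(persons, firms)
for row in linked:
    print(row)
```

Keeping unlinked persons in the result, rather than dropping them, lets an analyst see how many individuals lack organizational context before drawing conclusions from the linked subset.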

A distinctive new type of data is information about historical contexts,
which can be linked to time series data or microdata with a longitudinal
dimension. The European Social Survey (ESS), for instance, provides such a 
databank. It contains information on small and large historical events, and is 
updated on a daily basis. It is worthwhile to think about offering such a cen- 
tralized historical database to the community at large. 

Government and research-based statistical data on political and civil so- 
ciety organizations are in short supply in Germany. In many Western coun- 
tries, evidence about political parties — the most important type of political 
organization — is regularly obtained from national election surveys. Election 
surveys are also the main source of evidence on mass political participation. 
We want to note that in Germany, there is no guaranteed funding for election
surveys, although a major election project (GLES, German Longitudinal 
Election Study) is currently being undertaken. This project could develop 
into a national election study. 

Several of the advisory reports prepared for the German Data Forum 
(RatSWD) discussed detailed practical ways of realizing these possibilities. 
The RatSWD recommends that funding agencies consult these advisory re- 
ports when assessing specific applications to conduct organizational research. 


Theme 5: Making fuller use of existing large-scale datasets by adding 
special innovation modules and “related studies” 


Many of the advisory reports recommended that fuller use could be made of 
existing large-scale German datasets (such as ALLBUS) by adding special 
innovation modules, thereby creating greater value for money. Suggestions 
were made both for special samples and for special types of data to be col- 
lected. In all cases, it was suggested that the particular benefit of adding 
modules was that the underlying survey could serve as a national benchmark 
or reference dataset against which the new, more specialized data could be 
assessed. 

The availability of a reference dataset enables researchers to obtain a 
more contextualized understanding of the attitudes and behaviors of specific 
groups. Conversely, the availability of detailed and in-depth evidence about 
subsets of the population can strengthen the causal inferences that analysts of 
the main reference dataset are able to make. 

The advisory reports covering international and internal migration docu- 
ment substantial data deficits, which, it is suggested, could be largely over- 
come by adding special modules to existing longitudinal surveys (such as the 
SOEP). It has been pointed out that existing datasets do not allow researchers 
to track the life-cycles of migrants over long periods. This is particularly a 
problem in relation to highly skilled migrants, a group of special interest to 
policy-makers. Migrant booster samples, added to existing large-scale sur- 
veys, would largely overcome the problem. 

Reports written by experts in other fields made similar recommendations. 
For example, it was suggested that data deficits relating to pre-school educa- 
tion and vocational education and competencies could be partly overcome by 
adding short questionnaire modules to ongoing surveys. 

It is more or less conventional in the social sciences to collect explora- 
tory qualitative data — for example, open-ended interviews — to develop hy- 
potheses and lay the basis for quantitative measures prior to embarking on a 
large-scale quantitative project. It is suggested that this sequence can also 
sensibly be reversed. Once a quantitative study has been analyzed,
individuals or groups that are “typical” of certain subsets can be approached with a
view to conducting qualitative case studies. The researcher then knows
precisely what he/she has a “case of.” Extended or in-depth interviews can then
be undertaken to understand the decisions and actions that subjects have
taken at particular junctures in their lives, and the values and attitudes
underlying their decisions.⁸

In an advisory report it is proposed that innovation modules using “expe- 
rience sampling methods” be added to existing large-scale surveys. Again, 
the procedure would be to approach purposively selected respondents, repre- 
senting sub-sets of the main sample, and ask them to record their answers to 
a brief set of questions (e.g., about their current activities and moods) when a 
beeper alerts them to do so. 


Theme 6: Openness to new data sources and methods 


Advisory reports prepared for the German Data Forum (RatSWD) high- 
lighted the potential of several exciting new sources and methods of collec- 
ting data. We want to mention some of these sources, but without making 
specific funding recommendations. We do, however, want to stress that 
Germany needs to develop funding schemes that are receptive to inter-disci- 
plinary research proposals involving use of these new data sources and data 
collection methods. 


Digitization 


Survey data and publications in the social sciences have generally been 
available in digital form for some time. Thanks to the grid technology pro- 
moted by the Federal Ministry of Education and Research (BMBF) as part of 
the D-Grid Initiative, it is now possible to work with these digital data on a 
much larger scale and — more crucially — in new research contexts, thus ena- 
bling completely new approaches in empirical research. Yet the possibilities 
offered by grid technology have not been exploited in the social sciences to 
any notable extent. 

Large quantities of data that would be of interest in social sciences re- 
search are generated by the Internet (particularly online social networks) and 
by the use of mobile phone, GPS, and RFID technologies. To date, research- 
ers have drawn little benefit from such data, as numerous questions con- 
cerning access and data confidentiality remain unclarified. A few initiatives 
have been undertaken. For example, the networking site Facebook reports
that social scientists in all English-speaking countries are analyzing messages
posted on the site each day to assess changes in moods and perhaps
happiness levels.


8 It is important to address the privacy and ethical implications of approaching survey
respondents for additional interview data. Clearly, the respondents must be asked for
explicit consent to link the data sets.

However, it will not be possible to make substantial progress until access 
and privacy issues are resolved. The German Data Forum (RatSWD) notes 
that the UK’s Economic and Social Research Council (ESRC) has set up an 
Administrative Data Liaison Service to deal with these issues by linking 
academics to producers of administrative data. 


Geodata — A multifaceted challenge 


Most of the data used in the social sciences have a precise location in both 
space and time. While geodata are used widely in geography and spatial 
planning, this is generally not the case in the social sciences. Spatial data 
from various sources (e.g., concerning urban development or the weather) 
can readily be combined via the georeferences of the units under investiga- 
tion. This makes georeferenced data a valuable resource both for research 
and for policy advice and evaluation. While administrative spatial base data 
have been widely available for Germany for a long time, there has been an 
enormous increase in recent years in the supply of spatial data collected by 
user communities (e.g., OpenStreetMap) and private data providers (e.g., 
Google Street View). Furthermore, remote sensing data (aerial photos or 
satellite data) have become more important. These data are provided by dif- 
ferent sources, which makes it important to launch geodata infrastructure 
projects that bring together different geodata sets. It must be emphasized that 
data security is of high importance for this type of data; issues of personal 
rights are particularly sensitive. 

Closely related to geodata are data for regions, which can be defined as 
areas as large as a German Land or as small as a municipality. Regional data 
have been available for many years and have been used for cross-regional 
investigations and as context variables in studies investigating the behavior 
of persons or firms. Access to many datasets at various levels of regional 
aggregation is straightforward in Germany through the use of cheap 
CDs/DVDs and the Web.⁹ The main challenge is to offer access to geodata in
ways that allow easy combination with other data. Both current and older 
data need to be made available to allow for longitudinal studies. Furthermore, 
data on individuals, households, and buildings should be entered with a di- 
rect spatial reference; this is especially important for the forthcoming 2011 
Census. 
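
How data from different sources can be combined via georeferences may be illustrated with a minimal sketch. It assumes that each household record carries a latitude and longitude and attaches to it the attributes of the nearest weather station. The station and household records, their field names, and the temperature values are invented for illustration; a real project would use proper geodata infrastructure and observe the data security concerns noted above.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


# Hypothetical context data (station attributes are invented values).
stations = [
    {"name": "Berlin", "lat": 52.52, "lon": 13.40, "mean_temp_c": 9.5},
    {"name": "Munich", "lat": 48.14, "lon": 11.58, "mean_temp_c": 9.0},
]

# Hypothetical georeferenced survey units.
households = [
    {"household_id": "H1", "lat": 52.40, "lon": 13.06},
    {"household_id": "H2", "lat": 48.37, "lon": 10.90},
]


def attach_nearest_station(households, stations):
    """Combine the two sources by assigning each household its nearest station."""
    combined = []
    for hh in households:
        nearest = min(
            stations,
            key=lambda s: haversine_km(hh["lat"], hh["lon"], s["lat"], s["lon"]),
        )
        distance = haversine_km(hh["lat"], hh["lon"], nearest["lat"], nearest["lon"])
        combined.append({
            "household_id": hh["household_id"],
            "station": nearest["name"],
            "mean_temp_c": nearest["mean_temp_c"],
            "distance_km": round(distance, 1),
        })
    return combined


combined = attach_nearest_station(households, stations)
for row in combined:
    print(row)
```

The combined records deliberately omit the household coordinates, since releasing exact locations alongside survey answers would raise the re-identification risks the text warns about.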

An important recommendation for the future is to intensify collaboration 
between social science researchers and researchers in institutions in the
currently rather segregated areas of geoinformation and information
infrastructure. Thus, the German Data Forum (RatSWD) will set up a working group
on geodata and regional data with a view to bringing the different data
providers and users together.


9 http://www.geoportal.bund.de, http://www.raumbeobachtung.de, http://www.regionalstatistik.de.
[Accessed on: August 7, 2010].


Biodata — Research incorporating the effects of biological and genetic factors 
on social outcomes 


In recent times, greater attention has been paid in the social sciences to bio- 
medical variables, including genetic variables that influence social and eco- 
nomic behaviors. Many opportunities, and some serious risks, exist in this 
growing research field. Historically, social scientists have received no train- 
ing in biomedical research and are unlikely to be aware of the possibilities. 
Certainly, they have little knowledge of appropriate methods of data collec- 
tion and analysis. It is under discussion whether the German Data Forum 
(RatSWD) will set up a working group with a view to positioning German 
social scientists to be at the forefront of developments. The group would 
need to include biologists and medical scientists, as well as social scientists 
and — equally important — not only data protection specialists but also ethics 
specialists. In addition, one issue that such a working group would have to 
address is the difficulty that researchers who are working at the interface of 
the social and biomedical sciences currently have in attracting funding. 

A role model for this kind of multidisciplinary data collection may be 
found in the SHARE study, which has already conducted several pilot studies, 
collecting biomedical data from subsets of its Europe-wide sample. It 
has been shown that, with adequate briefing, medically untrained interview- 
ers can do a good job of getting high-quality data in biomedical surveys, 
without a significant increase in non-participation or drop-out rates. 


Virtual worlds for macro-social experiments 


Advocates of the use of computer-generated “virtual worlds” (such as 
“Second Life”) for social science research believe that they offer the best 
vehicle for developing and testing theories at a “macro-societal” level. Many 
of the problems facing humanity are international or threaten whole societies: 
climate change, nuclear weapons, water shortages, and unstable financial 
markets, to name just a few. By setting up virtual worlds with humans repre- 
sented by avatars, it is possible to conduct controlled experiments dealing 
with problems on this scale. The experiments can be run for long periods, 
like panel studies, and they can allow for the involvement of unlimited num- 
bers of players. They pose no serious risk to players and avoid the ethical 
issues that limit many experiments that simulate real situations. 

Advocates of macro-social experiments recognize that initial costs are 
high, but claim that the worlds they create hold the prospect of eventually 
being self-funding, paid for by the players themselves. 


Theme 7: Data quality and quality management 


An increasingly important role is being played by questions related to the 
quality of (1) available measurement instruments, and (2) documentation 
required to facilitate secondary analysis of existing datasets. 

In their advisory reports, experts in several areas made the point that a 
fairly wide range of measurement instruments was available to them, but 
that researchers would benefit from guidance in assessing their comparative 
reliability, validity, and practicality in fieldwork situations. In the advisory 
reports, it was suggested that something like a central clearing house was 
needed with a mandate to assess and improve standards of measurement. It 
was noted that the recent founding of the Institute for Educational Progress 
(IQB, Institut zur Qualitätsentwicklung im Bildungswesen) could serve as a 
model for additional subfields. 

The Institute was launched at a time when the poor performance of Ger- 
man students in standardized international tests led to increased concern with 
measuring learning outcomes. The IQB is measuring the performance of 
representative samples of students in the 16 German Länder, and will also be 
available to serve as a source of advice on measurement issues. 

A related but somewhat separate concern mentioned in several advisory 
reports is the poor quality of documentation provided for many surveys and 
other datasets that, in principle, are available for secondary analysis. It ap- 
pears that academic data collection has much to learn in this respect from 
official statistical agencies, which generally adhere to high standards in data 
collection and documentation. 

In thinking about data storage and documentation, a distinction should 
probably be drawn between two types of academic projects: those that are of 
interest only to a small group of researchers and those that are of wider inter- 
est. A mode of self-archiving (self-documentation) should suffice for the 
former type, although even here minimum satisfactory uniform standards 
need to be established. The latter type should be required to meet high pro- 
fessional standards of documentation and archiving (see Theme 10). 

To a large extent, improvement of survey data documentation is a matter 
of adopting high metadata standards. These are standards relating to the 
accurate description of surveys and other large-scale datasets that need to be 
met when data are archived. Historically, researchers paid little attention to 
the quality of metadata surrounding their work; archiving was left to archi- 
vists. This mind-set is changing. There have been rapid advances in the de- 
velopment and implementation of high-quality metadata standards, standards 
which apply to datasets throughout their life cycle from initial collection 
through to secondary use. 

An important source of survey metadata is the information collected in 
the recruitment of survey participants and in the actual survey itself con- 
cerning survey methods, the administration of the survey, and, when applica- 
ble, geographic location. These data, sometimes termed paradata, are typ- 
ically recorded by interviewers and stored at the surveying institute. The data 
are valuable for analyzing problems of survey non-response and for assessing 
the advantages and disadvantages of different data collection modes. Par- 
adata can be used for “continuous quality improvement” in survey research. 
It is recommended that efforts be made to standardize and improve the qual- 
ity of paradata collected by public and private-sector survey agencies. The 
European Statistical System has published a handbook on enhancing data 
quality through effective use of paradata. 

In Germany, the Research Data Centers have taken the lead in trying to 
improve current standards of documentation. Based on their experience, it 
appears that there are two internationally acceptable sets of metadata stand- 
ards — the Data Documentation Initiative (DDI) and the Statistical Data and 
Metadata Exchange (SDMX) Standard — which could be more widely used in 
Germany. Adoption of these standards requires the establishment of an IT 
infrastructure compatible with the industry standard for Web services. This 
infrastructure can then facilitate the management, exchange, harmonization, 
and re-use of data and metadata. 

We would like to highlight in particular one potential means of improv- 
ing documentation: the use of a unique identifier for datasets (e.g., a digital 
object identifier or DOI). Unique identifiers for particular measurement 
scales (e.g., the different versions of the “Big Five” inventory) could possibly 
also be helpful (see also Theme 10 below). 

The need for high-quality metadata appears even more pressing given that 
many Internet users who are not themselves scholars are making 
increased use of these data for their own analyses. Results generated by lay 
users are especially likely to be skewed or misleading if the strengths and 
limitations of the data are described inadequately or in jargon a layperson 
could not be expected to understand. 


Theme 8: Privacy issues 


This section deals with privacy issues, particularly those that arise due to 
increasingly sophisticated methods of data linkage. Record linkage refers to 
the possibility of linking up different datasets containing information about 
the same units (e.g., individuals or firms). Linkages may be made, for exam- 
ple, between different surveys or between survey data and administrative 
data. Normally, datasets can only be linked if a common identifier is 
available. However, linkage can sometimes now be achieved by means of 
“statistical matching” when datasets do not contain the same identifiers for 
particular individuals. 
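
The difference between the two approaches can be illustrated with a minimal 
Python sketch. All field names, sample values, and function names below are 
hypothetical, and the second function is a deliberately naive stand-in for 
statistical matching (exact agreement on a few shared characteristics), not a 
method proposed in the advisory reports. 

```python
def link_by_identifier(survey, admin, key="person_id"):
    """Deterministic linkage: join records that share a common identifier."""
    admin_index = {rec[key]: rec for rec in admin}
    return [
        {**s, **admin_index[s[key]]}
        for s in survey
        if s[key] in admin_index
    ]


def match_by_characteristics(survey, admin, fields=("birth_year", "sex", "region")):
    """Naive statistical matching: pair each survey record with the first
    unused administrative record that agrees on all shared characteristics."""
    unused = list(admin)
    pairs = []
    for s in survey:
        for a in unused:
            if all(s[f] == a[f] for f in fields):
                pairs.append((s, a))
                unused.remove(a)  # safe: we leave the inner loop immediately
                break
    return pairs


# Hypothetical example data (assumption, not taken from any real dataset)
survey = [
    {"person_id": 1, "birth_year": 1970, "sex": "f", "region": "Berlin", "income": 3200},
    {"person_id": 2, "birth_year": 1985, "sex": "m", "region": "Bayern", "income": 2800},
]
admin = [
    {"person_id": 2, "birth_year": 1985, "sex": "m", "region": "Bayern", "employer": "A"},
    {"person_id": 3, "birth_year": 1970, "sex": "f", "region": "Berlin", "employer": "B"},
]

linked = link_by_identifier(survey, admin)
matched = match_by_characteristics(survey, admin)
```

In this toy example, linkage by identifier joins only the one person present 
in both files, whereas matching on characteristics also pairs survey person 1 
with administrative person 3. The matched pair is statistically similar but 
is not the same individual, which is exactly why such matches need to be 
interpreted with care. 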

When an individual or firm consents to take part in a specific research 
project, her commitment — and the limits of that commitment — are usually 
reasonably clear. But what is the situation if researchers then link a file ob- 
tained for this specific project to other files about the respondent, which, for 
example, contain information about her employer, tax files, health, or precise 
geographical location? Clearly, such linked data are of immense value to 
researchers, both in conducting basic scientific research and in providing 
policy advice. While it is clear that such linking may only take place with the 
explicit consent of the concerned individuals, how “explicit” must this con- 
sent be? Do the individuals whose data are being linked need to provide 
specific consent prior to each new linkage? 

The advisory reports written for the German Data Forum (RatSWD) ex- 
press a wide variety of views on this matter. While some legal experts have 
described such data linking as a breach of law, we believe that these prob- 
lems could be best resolved by passing legislation that would require re- 
searchers to observe a principle of “research confidentiality” (Forschungs- 
datengeheimnis). This legislation, which was recommended by the KVI in 
2001, would require that if authorized researchers obtained knowledge of the 
identity of their research subjects — even by accident — they would be obliged 
not to reveal the identities under any circumstances. Most important, the act 
would prevent both police and any other authorities from seizing the data. 
When pushing forward the issue of “research confidentiality,” it will be im- 
portant to refer to the European legislation. 

A further proposal, or perhaps an alternative, discussed in one of the 
advisory reports, is for data stewards (Treuhänder) to be appointed for the 
purpose of protecting the privacy of research subjects. Data stewards would 
be responsible for keeping records of the identity of subjects and would only 
pass data on to researchers for analysis with the identifying information re- 
moved. 
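
The division of labor described here could look roughly as follows. This is 
only a sketch under stated assumptions: the class name, the choice of 
identifying fields, and the use of random pseudonyms are all hypothetical 
and are not specified in the advisory reports. 

```python
import secrets


class DataSteward:
    """Hypothetical trustee that keeps identities separate from research data."""

    # Assumption: which fields count as identifying is decided by the steward.
    IDENTIFYING_FIELDS = ("name", "address")

    def __init__(self):
        # pseudonym -> identifying information; never passed on to researchers
        self._identities = {}

    def deposit(self, record):
        """Store the identifying fields and return a de-identified copy."""
        pseudonym = secrets.token_hex(8)  # random 16-character hex string
        self._identities[pseudonym] = {
            f: record[f] for f in self.IDENTIFYING_FIELDS if f in record
        }
        released = {
            k: v for k, v in record.items() if k not in self.IDENTIFYING_FIELDS
        }
        released["pseudonym"] = pseudonym
        return released


steward = DataSteward()
released = steward.deposit(
    {"name": "Erika Mustermann", "address": "Musterweg 1", "age": 45, "income": 3100}
)
```

The researcher receives only `released`, which carries the analysis variables 
and a pseudonym. The mapping from pseudonym back to a person stays with the 
steward. 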

A more general recommendation given in the reports is that a “National 
Record Linkage Center” be set up to cover all fields in which record linkage 
is an issue. This has been proposed in part to avoid the duplication that 
would occur if each branch of social science made its own separate efforts. 
The German Data Forum (RatSWD) expressly abstains from making any 
specific recommendations, but believes that the proposal is worth detailed 
consideration. 


Theme 9: Research ethics 


This theme deals with two separate sets of ethical issues: the ethics of re- 
search using human subjects, and the ethics of scientists in publicizing their 
results. 


Research using human subjects 


The need to define and enforce ethical standards in research using human 
subjects has always been urgent and has become more so in view of the in- 
creasing availability of new types of data highlighted in this report: adminis- 
trative and commercial data, data from the Internet, geodata, and biodata. 

In practical terms, Germany does not yet have a detailed set of ethical 
requirements specifically designed to protect individuals who take part in 
research projects in the social sciences — a field typically concerned, of 
course, with the administration of surveys, and not human experiments. 
However, all researchers have to abide by the requirements of the Federal 
Data Protection Act. Additionally, the main professional associations in soci- 
ology and psychology have issued ethical guidelines, but these mainly affect 
behavior towards peers, rather than towards research subjects. 

A review of ethics procedures in the UK and the US was undertaken by 
an advisory report to see if they offered useful examples for Germany. Brit- 
ish procedures appear worth consideration; US procedures are perhaps too 
heavily geared towards the natural sciences. 

In the UK, beginning in 2006, the Economic and Social Research Coun- 
cil (ESRC), which is the main funding body for academic research, forced 
universities whose researchers were seeking funding from ESRC to set up 
ethics committees. In practice, committees have been put in place in all uni- 
versities, usually operating at the departmental or faculty level and not al- 
ways on a university-wide basis. The committees are required to implement 
six key principles, four of which protect human subjects. Subjects have to be 
fully informed about the purposes and use of the research in which they are 
participating; they have the right to be anonymous; the data they provide 
must remain confidential; participation must be voluntary, and the research 
must avoid harm to the subjects. 

The principle of “avoiding harm” is particularly important in view of the 
increasing availability of Web data, geodata, and biodata. “Avoiding harm” 
appears to be a principle of more practical relevance than the principle of 
“beneficence,” which German social scientists, borrowing from the biologi- 
cal sciences, have sometimes incorporated into ethical guidelines. 

Above all, given that research is conducted increasingly on the basis of 
international exchange, and research data are exchanged between different 
countries and national research institutions, it is of growing importance that 
data-sharing organizations be able to rely on users to handle their data 
responsibly. Due to differences in national data security regulations as well as 
in research ethics standards, this is a difficult task, which, at worst, can 
hinder research. Universal data protection rules are desirable but 
extremely unlikely. Thus, it is important that, at a minimum, the scientific 
and statistical expert communities seek to foster the development of ethical 
standards which are then voluntarily adopted by those engaged in research 
and statistical work. 


Scientific responsibility in publicizing results 


A key set of ethical issues surrounds the responsibility of scientists in 
publishing and publicizing their results. In a recent editorial in Science,¹⁰ it is 
noted that “bridging science and society” is possible only if scientists behave 
properly — that is, in accordance with scientific standards. The editorial men- 
tions not just the need to avoid obvious scientific misconduct relating to data 
fraud or undisclosed conflicts of interest, but also the importance of avoiding 
“over-interpretation” of scientific results. 

It is worth noting that many economists appear to believe that over-inter- 
pretation (by simplifying results) is necessary if a scientist wants to reach the 
general public. The former Federal President of Germany, Mr. Koehler, an 
economist, appeared to endorse this approach by calling for social scientists 
to announce “significant” findings without burying important results under 
too many details. 

We believe that it would not be wise for social scientists to take this ad- 
vice, precisely because scientific results often become the subject of con- 
tentious public policy debates. Empirical results can have the effect of mak- 
ing policy debates more rational, but only if the assumptions underlying 
research and shortcomings that mar obtained results are communicated hon- 
estly. It is a duty of the scientific community to promote this type of honesty. 


Theme 10: Giving credit where credit is due 


A key principle of these recommendations is “to give credit where credit is 
due.” This principle¹¹ should apply to efforts at developing the social science 
research infrastructure just as much as to academic authorship. In general, 
valuable new infrastructural initiatives will only be launched if the staff of 
infrastructures under academic direction, of official statistical agencies — and 
perhaps of private-sector organizations that collect and provide data as well — 
feel recognized and rewarded for undertaking this important work. Junior 
and senior staff of all types of organizations need to be clearly recognized for 
their important contributions. 


10 Science, February 19, 2010, Vol. 327, 921. 
11 Nature, December 17, 2009, Vol. 462, 825. 

Existing academic conventions about “authorship” are not entirely satis- 
factory, nor are “science metrics” that evaluate the output of researchers, 
universities, and research institutes. In a recent article in Nature¹² it is 
suggested: 


“Let’s make science metrics more scientific. To capture the essence of good science, stake- 
holders must combine forces to create an open, sound and consistent system for measuring 
all the activities that make up academic productivity. ... The issue of a unique researcher 
identification system is one that needs urgent attention.” 


Sometimes effective partnerships and joint investments by academic research 
institutes, official statistical agencies, and private fieldwork organizations 
occur despite seriously inadequate incentives and recognition. However, in 
order to make such collaborations more than rare events, the “rules of the 
game” must be changed. The establishment and running of infrastructure 
resources like biobanks, social surveys, and the Scientific Use Files of offi- 
cial resident registration data must be rewarded more adequately than at 
present. This applies to official statistics, public administrations, private 
organizations, and the entire scientific system. The German Data Forum 
(RatSWD) sees itself as one of the key players in promoting discussion and 
proposing effective steps on this issue. Here we want to mention two instru- 
ments that might help to ensure that credit is given where it is due. 

First, the establishment of a system for the persistent identification of 
datasets (e.g., the DOI system) would not only allow easier access to data, 
but also make datasets more visible and easily citable, thereby enabling the 
authors/compilers of the data to be clearly recognized. Even particular meas- 
urement “devices” (e.g., specific scales for the “Big Five” inventory) might 
be identified and citable by unique identifiers. A digital object identifier 
makes it easier to see the links between a scholarly article, the relevant data- 
sets, and the authors/compilers of the datasets. There are already some 
organizations that have assigned DOIs to datasets (e.g., CrossRef and 
DataCite). 
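
How a persistent identifier makes a dataset citable can be sketched in a few 
lines of Python. The record class, the example DOI, the names, and the 
citation layout are illustrative assumptions. Only the `https://doi.org/` 
resolver prefix reflects how DOIs are actually resolved. 

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """Hypothetical minimal description of a citable dataset."""
    creators: list
    year: int
    title: str
    publisher: str
    doi: str

    def doi_url(self):
        # A DOI becomes a resolvable link when prefixed with the DOI resolver.
        return f"https://doi.org/{self.doi}"

    def citation(self):
        # Assumed layout: Creators (Year): Title. Publisher. Identifier
        names = "; ".join(self.creators)
        return f"{names} ({self.year}): {self.title}. {self.publisher}. {self.doi_url()}"


# Hypothetical dataset; the DOI below is a placeholder, not a registered one.
record = DatasetRecord(
    creators=["Mustermann, E.", "Beispiel, M."],
    year=2010,
    title="Example Household Panel, Wave 1",
    publisher="Example Research Data Center",
    doi="10.1234/example-dataset",
)
print(record.citation())
```

Because the compilers of the data appear in the citation alongside the 
identifier, a scholarly article that cites the dataset in this way credits 
them explicitly. 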

Second, the issue of a unique researcher identification system is equally 
important and needs urgent attention. The recent launch of Open Researcher 
and Contributor ID (ORCID) looks particularly promising. The use of a unique 
researcher ID makes the scientific contributions of each individual researcher 
who works on a dataset clearly visible. 


12 Nature, March 25, 2010, Vol. 464, 488-89. 


Concluding remarks 


In Germany, there are several organizations for funding scientific research. Due 
to this “fragmented” funding environment, policy-makers, government officials, 
and senior researchers often believe that a more centralized organization would 
perform better. However, we at the German Data Forum (RatSWD) disagree. 
We are convinced that competition opens up more space for new ideas than 
would be available under a centralized system. 

Even though we do not support centralized organization of research, we 
nevertheless recognize an increasing need to provide long-term funding to 
establish and run large-scale social science infrastructure. Fortunately, the 
academic community, official statistical agencies, and government research 
institutes are thinking more than ever before about how to reorganize and 
finance infrastructure in research and statistics. So, for example, the German 
Council of Science and Humanities (WR, Wissenschaftsrat) and Germany’s 
Joint Science Conference (GWK, Gemeinsame Wissenschaftskonferenz) 
have working groups underway that are considering matters of research 
infrastructure.¹³ The discussions in these working groups have already made 
obvious that not only Research Data Centers and data archives but also more 
and more libraries — university and research institute libraries as well as cen- 
tralized specialist libraries (Fachbibliotheken) — are an important part of the 
research infrastructure, providing crucial data documentation and access 
services. The Federal Archive (Bundesarchiv) could also play a specific role. 
Nothing is settled yet. However, it is time to find a new and appropriate 
division of labor among these institutions. 

Many approaches will no doubt be considered, but in our view it is pref- 
erable to develop principles for funding and managing research infrastruc- 
ture, rather than to attempt the almost impossible task of formulating a de- 
tailed master plan. 

The German Data Forum (RatSWD) is itself neither a research organiza- 
tion nor a funding organization. It exists to offer advice on research and data 
issues. This places it in an ideal position to moderate discussions and help 
find the most appropriate funding arrangements for the social sciences.¹⁴ 


13 These are (in 2010) the “Research Infrastructure Coordination Group 
(Koordinierungsgruppe Forschungsinfrastruktur)” and the “Working Group on a 
Research Infrastructure for the Social Sciences and Humanities (Arbeitsgruppe 
Infrastruktur für sozial- und geisteswissenschaftliche Forschung)” of the 
German Council of Science and Humanities (WR, Wissenschaftsrat) as well as the 
“Commission on the Future of Information Infrastructure (KII, Kommission 
Zukunft der Informationsinfrastruktur)” of the Joint Science Conference by the 
Federal and Länder Governments (GWK, Gemeinsame Wissenschaftskonferenz des 
Bundes und der Länder). 

14 See also the “Science-Policy Statement on the Status and Future Development of the 
German Data Forum (RatSWD)” by the German Council of Science and Humanities (WR, 
Wissenschaftsrat). Schmollers Jahrbuch, 130 (2), 269-277. 